Novel Selection Methods for Monte-Carlo Tree Search
Abstract
Preface

In this thesis I present the results of my investigation into regret minimization for Monte-Carlo Tree Search. The thesis presents the motivation, background, and formal definition of a novel search technique based on minimizing both simple and cumulative regret in a game tree: Hybrid MCTS (H-MCTS). The technique minimizes the two types of regret in a single search tree. This ensures that recommendations made by the algorithm have low simple regret, while at the same time internal nodes are sampled efficiently. It was developed for, and tested in, six two-player games.

Special thanks go to both Dr. Mark Winands and Dr. Marc Lanctot for providing the inspiration and guidance required to develop this novel algorithm. Their combined experience was crucial to obtaining the results presented in this work. Thanks go to Prof. Dr. Tristan Cazenave for his time and assistance with the implementation of SHOT, and for the experiments he performed in his award-winning engine. Thanks also to Dr. Steve Kroon for his insightful input and assistance in proofreading the work. Moreover, I would like to thank my wife Priscilla for her support, and for her patience and understanding. Without both her emotional and financial assistance you would not be reading this thesis.

Summary

Monte-Carlo Tree Search (MCTS) is a best-first search technique that bases decisions on sampling the state space of a domain. In many domains, MCTS has proven to be an effective approach when complex decision-making based on future rewards and outcomes is required. The technique was initially inspired by algorithms used to solve multi-armed bandit (MAB) problems. Such a problem can be described as a single-ply MCTS search, in which an agent is given a choice of options (arms), each with its own probability distribution.
Sampling an arm returns a random result from its underlying distribution, and the goal of the agent is to maximize its reward and/or recommend the arm with the most rewarding distribution. Depending on the context of the MAB problem, the agent's goal is either to minimize simple regret, i.e., the regret of not recommending the best action, or cumulative regret, i.e., the regret accumulated over time. Applying this theory to MCTS, however, may require more consideration. In a recursive MAB (such as MCTS), where the distribution of each arm is based on an underlying, growing search tree, minimizing a single type of regret throughout the tree implies that at each ply of …
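As a concrete illustration of the two regret notions in the summary above, the following sketch (not code from the thesis; the arm means, budget, and UCB1 policy are assumptions for illustration) runs a Bernoulli bandit with UCB1 selection and then measures both the cumulative regret of the sampling and the simple regret of the final recommendation:

```python
import math
import random

def ucb1_bandit(means, budget, c=math.sqrt(2), seed=0):
    """Sample Bernoulli arms with UCB1 and report both types of regret.

    UCB1 targets cumulative regret: it repeatedly pulls the arm with the
    highest upper confidence bound  mean + c*sqrt(ln(t)/n).  Simple
    regret is judged only by the final recommendation, taken here to be
    the most-pulled arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total reward per arm

    for t in range(1, budget + 1):
        if t <= k:            # pull every arm once to initialize
            arm = t - 1
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + c * math.sqrt(math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    # Cumulative regret: expected reward lost over all pulls.
    cumulative = sum(counts[a] * (best - means[a]) for a in range(k))
    # Simple regret: loss incurred by the recommended arm alone.
    recommended = max(range(k), key=lambda a: counts[a])
    simple = best - means[recommended]
    return recommended, simple, cumulative
```

With a sufficient budget the recommendation's simple regret drops to zero even though cumulative regret keeps growing slowly with every exploratory pull, which is exactly the tension between the two objectives that H-MCTS addresses at different plies of the tree.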
Similar resources
Cooperative Games with Monte Carlo Tree Search
A Monte Carlo Tree Search approach with Pareto optimality and a pocket algorithm is used to solve and optimize the multi-objective constraint-based staff scheduling problem. The proposed approach has a two-stage selection strategy, and the experimental results show that the approach is able to produce solutions for cooperative games.
Monte-Carlo Tree Search: Applied to Domineering and Tantrix
Contents (excerpt): Chapter 1: Introduction; The Rules of Tantrix; …
Monte-Carlo Exploration for Deterministic Planning
Search methods based on Monte-Carlo simulation have recently led to breakthrough performance improvements in difficult game-playing domains such as Go and General Game Playing. Monte-Carlo Random Walk (MRW) planning applies Monte-Carlo ideas to deterministic classical planning. In the forward chaining planner ARVAND, Monte-Carlo random walks are used to explore the local neighborhood of a search ...
Thompson Sampling Based Monte-Carlo Planning in POMDPs
Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the tradeoff between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm — Dirichlet-Di...
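The snippet above leaves the Thompson-sampling mechanism implicit. As a generic illustration (a minimal Beta-Bernoulli sketch under assumed arm means; it is not the Dirichlet-based algorithm the abstract names), each arm keeps a Beta posterior and the arm with the highest posterior draw is pulled, which balances exploration and exploitation without an explicit confidence bound:

```python
import random

def thompson_bandit(means, budget, seed=0):
    """Beta-Bernoulli Thompson sampling over Bernoulli arms.

    Each arm keeps a Beta(wins+1, losses+1) posterior.  Every step draws
    one sample from each posterior and pulls the arm with the highest
    draw; uncertain arms occasionally win the draw, so exploration
    decays naturally as evidence accumulates.
    """
    rng = random.Random(seed)
    k = len(means)
    wins = [0] * k
    losses = [0] * k
    for _ in range(budget):
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                 for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        if rng.random() < means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    # Recommend the arm with the highest posterior mean.
    return max(range(k),
               key=lambda a: (wins[a] + 1) / (wins[a] + losses[a] + 2))
```

For example, `thompson_bandit([0.2, 0.8], 500)` concentrates its pulls on the second arm and recommends it, illustrating why posterior sampling is attractive as a selection policy inside MCTS-style planners.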
Stochastic Planning in Large Search Spaces
Multi-agent planning approaches are employed for many problems including task allocation, surveillance and video games. In the first part of my thesis, we study two multi-robot planning problems, i.e. patrolling and task allocation. For the patrolling problem, we present a novel stochastic search technique, Monte Carlo Tree Search with Useful Cycles, that can generate optimal cyclic patrol poli...
Metareasoning for Monte Carlo Tree Search
Sequential decision problems are often approximately solvable by simulating possible future action sequences; such methods are a staple of game-playing algorithms, robot path planners, model-predictive control systems, and logistical planners in operations research. Since the 1960s, researchers have sought effective metareasoning methods for selecting which action sequences to simulate, basing t...
Publication date: 2014